Section: New Results
Applications in Social Networks
Participants : Eitan Altman, Konstantin Avrachenkov, Swapnil Dhamal, Giovanni Neglia.
Fairness in Online Social Network Timelines
Facebook News Feed personalization algorithm has a significant impact, on a daily basis, on the lifestyle, mood and opinion of millions of Internet users. Nonetheless, the behavior of such algorithm lacks transparency, motivating measurements, modeling and analysis in order to understand and improve its properties. E. Altman and G. Neglia, together with other researchers from THANES team (E. Hargreaves and D. Menasché from UFRJ, A. Reiffers-Masson from IIsc, and E. Altman) and with the journalist C. Agosti (Univ. of Amsterdam), have proposed a reproducible methodology encompassing measurements, an analytical model and a fairness-based News Feed design. The model leverages the versatility and analytical tractability of time-to-live (TTL) counters to capture the visibility and occupancy of publishers over a News Feed. Measurements from 2018 Italian political election are used to parameterize and to validate the expressive power of the proposed model. Then, we have conducted a what-if analysis to assess the visibility and occupancy bias incurred by users against a baseline derived from the model. Our results indicate that a significant bias exists and it is more prominent at the top position of the News Feed. In addition, we have found that the bias is non-negligible even for users that are deliberately set as neutral with respect to their political views, motivating the proposal of a novel and more transparent fairness-based News Feed design. This is a very recent research direction, but it has already led to 4 publications [36], [27], [28], [12] with a best paper award for [36].
Sampling online social networks
In the framework of network sampling, random walk (RW) based estimation techniques provide many pragmatic solutions while uncovering the unknown network as little as possible. Despite several theoretical advances in this area, RW based sampling techniques usually make a strong assumption that the samples are in stationary regime, and hence are impelled to leave out the samples collected during the burn-in period. In [4] K. Avrachenkov, together with V.S. Borkar (IIT Bomaby, India), A. Kadavankandy (CentraleSupélec) and J.K. Sreedharan (Purdue Univ., USA), propose two sampling schemes without burn-in time constraint to estimate the average of an arbitrary function defined on the network nodes, for example, the average age of users in a social network. The central idea of the algorithms lies in exploiting regeneration of RWs at revisits to an aggregated super-node or to a set of nodes, and in strategies to enhance the frequency of such regenerations either by contracting the graph or by making the hitting set larger. Our first algorithm, which is based on reinforcement learning (RL), uses stochastic approximation to derive an estimator. This method can be seen as intermediate between purely stochastic Markov chain Monte Carlo iterations and deterministic relative value iterations. The second algorithm, which we call the Ratio with Tours (RT)-estimator, is a modified form of respondent-driven sampling (RDS) that accommodates the idea of regeneration. We study the methods via simulations on real networks. We observe that the trajectories of RL-estimator are much more stable than those of standard random walk based estimation procedures, and its error performance is comparable to that of respondent-driven sampling (RDS) which has a smaller asymptotic variance than many other estimators. Simulation studies also show that the mean squared error of RT-estimator decays much faster than that of RDS with time. The newly developed RW based estimators (RL- and RT-estimators) allow to avoid burn-in period, provide better control of stability along the sample path, and overall reduce the estimation time.
Crawling ephemeral content
In [3], K. Avrachenkov and V.S. Borkar (IIT Bombay, India) consider the task of scheduling a crawler to retrieve from several sites their ephemeral content. This is content, such as news or posts at social network groups, for which a user typically loses interest after some days or hours. Thus development of a timely crawling policy for ephemeral information sources is very important. The authors first formulate this problem as an optimal control problem with average reward. The reward can be measured in terms of the number of clicks or relevant search requests. The problem in its exact formulation suffers from the curse of dimensionality and quickly becomes intractable even with a moderate number of information sources. Fortunately, this problem admits a Whittle index, a celebrated heuristics which leads to problem decomposition and to a very simple and efficient crawling policy. The authors derive the Whittle index for a simple deterministic model and provide its theoretical justification. They also outline an extension to a fully stochastic model.
Posting behavior
In [32], E. Altman in collaboration with A. Reiffers-Masson (IISc, India), Y. Hayel and G. Marrel (UAPV) consider a “generalized” fractional program in order to solve a popularity optimization problem in which a source of contents controls the topics of her contents and the rate with which posts are sent to a time line. The objective of the source is to maximize its overall popularity in an Online Social Network (OSN). The authors propose an efficient algorithm that converges to the optimal solution of the Popularity maximization problem.
Recommendation system for OSNs
When a user interested in a service/item, visits an online web-portal, it provides description of its interest through initial search keywords. The system recommends items based on these keywords. The user is satisfied if it finds the item of its choice and the system benefits, otherwise the user explores an item from the list. In [33], E. Altman in collaboration with K. Veeraruna, S. Memon, M. Hanawal and R. Devanand (IEOR IIT Bombay, India), develop algorithms that efficiently utilize user responses to recommended items and find the item of user's interest quickly. The authors first derive optimal policies in the continuous Euclidean space and adapt the same to the space of discrete items.
Opinion dynamics
S. Dhamal, W. Ben-Ameur, T. Chahed (both from Telecom SudParis), and E. Altman have studied the problem of optimally investing in nodes of a social network, wherein two camps attempt to maximize adoption of their respective opinions by the population. In [11], several settings are analyzed, namely, when the influence of a camp on a node is a concave function of its investment on that node, when one of the camps has uncertain information regarding the values of the network parameters, when a camp aims at maximizing competitor's investment required to drive the overall opinion of the population in its favor, and when there exist common coupled constraints concerning the combined investment of the two camps on each node. In [23], the possibility of campaigning in multiple phases is explored, where the final opinion of a node in a phase acts as its initial bias for the next phase. A further intricate setting where a camp's influence on a node also depends on the node's initial bias, is analyzed in [22]. Extensive simulations are conducted on real-world social networks for all the considered settings.
Information diffusion under practical models
S. Dhamal has studied the effectiveness of adaptive seeding in multiple phases under the independent cascade model of information diffusion, in [25]. The effect on the mean and standard deviation of the extent of diffusion is observed, with an explanation of how adaptive seeding reduces uncertainty in diffusion. The other aspects studied are: how the number of phases impacts the effectiveness of diffusion, how the diffusion progresses phase-by-phase, and how to optimally split the total seeding budget across phases. Another study [26] generalizes the linear threshold model to account for multiple product features, and presents an integrated framework for product marketing using multiple channels: mass media advertisement, recommendations using social advertisement, and viral marketing using social networks. An approach for allocating budget among these channels is proposed.